Multiresolution deep learning approaches, such as the U-Net architecture, have achieved high performance in classifying and segmenting images. However, these approaches do not provide a latent image representation and cannot be used to decompose, denoise, and reconstruct image data. The U-Net and other convolutional neural networks (CNNs) typically use pooling to enlarge the receptive field, which usually results in irreversible information loss. This study proposes to include a Riesz-Quincunx (RQ) wavelet transform, which combines 1) higher-order Riesz wavelet transforms and 2) orthogonal Quincunx wavelets (both of which are used to reduce blur in medical images), inside the U-Net architecture to reduce noise in satellite images and their time series. In the transformed feature space, we propose a variational approach to understand how random perturbations of the features affect the image, in order to further reduce noise. Combining both approaches, we introduce a hybrid RQUNet-VAE scheme for image and time-series decomposition that reduces noise in satellite imagery. We present qualitative and quantitative experimental results showing that our proposed RQUNet-VAE is more effective at reducing noise in satellite imagery than other state-of-the-art methods. We also apply our scheme to several applications for multi-band satellite images, including image denoising, image and time-series decomposition by diffusion, and image segmentation.
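As a rough illustration of the shrink-in-wavelet-space idea behind this scheme, here is a minimal denoiser using PyWavelets; a standard separable db2 wavelet stands in for the Riesz-Quincunx transform, which is not available in off-the-shelf libraries, and the thresholding rule is a textbook universal threshold rather than the paper's variational step.

```python
# Minimal wavelet-shrinkage denoiser: a standard separable wavelet
# (db2) stands in for the paper's Riesz-Quincunx transform.
import numpy as np
import pywt

def wavelet_denoise(img: np.ndarray, wavelet: str = "db2",
                    level: int = 3, sigma: float = 0.05) -> np.ndarray:
    # Decompose the image into multiresolution subbands.
    coeffs = pywt.wavedec2(img, wavelet, level=level)
    # Soft-threshold the detail subbands; keep the approximation intact.
    thresh = sigma * np.sqrt(2 * np.log(img.size))  # universal threshold
    denoised = [coeffs[0]] + [
        tuple(pywt.threshold(c, thresh, mode="soft") for c in detail)
        for detail in coeffs[1:]
    ]
    # Reconstruct: the transform is (bi)orthogonal, so nothing is lost
    # apart from the thresholded noise.
    return pywt.waverec2(denoised, wavelet)
```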
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
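Since the models are released openly, a BLOOM checkpoint can be loaded through the standard Hugging Face transformers API; the sketch below uses the small 560M variant published under the bigscience organization so that it runs on modest hardware (the full 176B model requires a multi-GPU setup).

```python
# Loading a BLOOM checkpoint from the Hugging Face Hub and generating text.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m")
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")

inputs = tokenizer("The capital of Vietnam is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```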
We introduce efficient deep learning-based methods for legal document processing, covering the Legal Document Retrieval and Legal Question Answering tasks of the Automated Legal Question Answering Competition (ALQAC 2022). In this competition, we achieve 1\textsuperscript{st} place in the first task and 3\textsuperscript{rd} place in the second task. Our method is based on the XLM-RoBERTa model, which is pre-trained on a large unlabeled corpus before being fine-tuned for the specific tasks. The experimental results show that our method works well in legal information retrieval tasks with limited labeled data. Moreover, this method can be applied to other information retrieval tasks in low-resource languages.
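As a sketch of the retrieval setup (not the authors' exact fine-tuning pipeline), a generic XLM-RoBERTa bi-encoder baseline can rank candidate legal articles by cosine similarity of mean-pooled embeddings:

```python
# Generic bi-encoder retrieval baseline with XLM-RoBERTa: mean-pool token
# embeddings and rank articles by cosine similarity. Illustrative only.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("xlm-roberta-base")
enc = AutoModel.from_pretrained("xlm-roberta-base")

def embed(texts):
    batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = enc(**batch).last_hidden_state          # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1)          # (B, T, 1)
    pooled = (hidden * mask).sum(1) / mask.sum(1)         # mean pooling
    return torch.nn.functional.normalize(pooled, dim=-1)

query = embed(["Is a verbal agreement legally binding?"])
articles = embed(["Article 385. A contract is an agreement between parties.",
                  "Article 119. Form of civil transactions."])
scores = query @ articles.T                               # cosine similarity
print(scores.argsort(descending=True))                    # ranked articles
```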
Visual perception and language understanding are fundamental components of human intelligence, enabling people to perceive and reason about objects and their interactions. It is essential for machines to have this capability of reasoning across the two modalities in order to invent new robot-human collaborative systems. Recent advances in deep learning have produced separate, sophisticated representations of visual scenes and of language. However, understanding the associations between the two modalities in a shared context for multimodal reasoning remains a challenge. Focusing on the language and vision modalities, this thesis advances the understanding of how key aspects of vision-and-language tasks can be exploited with neural networks to support reasoning. We derive these understandings from a series of works, making a two-fold contribution: (i) effective mechanisms for content selection and for constructing temporal relations from dynamic visual scenes in response to a linguistic query, preparing sufficient knowledge for the reasoning process; and (ii) new frameworks that enable neural networks to perform reasoning by exploiting visual-linguistic associations, deduced either directly from data or guided by external priors.
Building a system that can have meaningful conversations with humans about what they watch would be a technological feat. A setting toward that goal is presented as the video dialog task, which asks a system to generate natural utterances in response to questions in an ongoing dialog. The task poses great visual, linguistic, and reasoning challenges that cannot easily be overcome without an appropriate representation scheme over video and dialog that supports high-level reasoning. To tackle these challenges, we propose a new object-centric framework for video dialog that supports neural reasoning, dubbed COST - Conversation about Objects in Space-Time. Here, the dynamic space-time visual content in a video is first parsed into object trajectories. Given this video abstraction, COST maintains and tracks object-associated dialog states, which are updated upon receiving new questions. Object interactions are dynamically and conditionally inferred for each question, and they serve as the basis for relational reasoning among the objects. COST also maintains a history of previous answers, which allows retrieval of relevant object-centric information to enrich the answer-forming process. Language production then proceeds in a step-wise manner, taking into account the context of the current utterance, the existing dialog, and the current question. We evaluate COST on the DSTC7 and DSTC8 benchmarks, demonstrating its competitiveness against the state of the art.
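The following schematic sketch illustrates the object-centric dialog loop described above; all names (ObjectState, infer_interactions, the toy trajectories) are hypothetical placeholders, not the authors' code, and the neural modules are reduced to trivial stubs.

```python
# Hypothetical skeleton of an object-centric dialog loop in the spirit
# of COST: per-object dialog states are updated with each new question.
from dataclasses import dataclass, field

@dataclass
class ObjectState:
    name: str
    trajectory: list                       # space-time track (toy data here)
    dialog_state: list = field(default_factory=list)

def infer_interactions(objects, question):
    # Placeholder: COST infers interactions with neural relation modules.
    return [(a.name, b.name) for a in objects for b in objects if a is not b]

def answer(question, objects, history):
    interactions = infer_interactions(objects, question)
    for obj in objects:                    # update object-associated states
        obj.dialog_state.append(question)
    context = history[-1][1] if history else ""   # retrieve prior answers
    return f"<utterance about {interactions[:1]} given '{context}'>"

objects = [ObjectState("man", [(0, 0)]), ObjectState("cup", [(1, 1)])]
history = []
for q in ["What is the man holding?", "Does he put it down?"]:
    history.append((q, answer(q, objects, history)))
print(history)
```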
Convolutional neural networks (CNNs) have achieved promising results in medical image segmentation. However, CNNs require large amounts of training data and cannot handle pose variations and deformations of objects. Furthermore, their pooling layers tend to discard important information such as position, and CNNs are sensitive to rotations and affine transformations. Capsule networks are a recent architecture that achieves better robustness in part-whole representation learning by replacing pooling layers with dynamic routing and convolutional strides, and they have shown promising results on popular tasks such as digit classification and object segmentation. In this paper, we propose a 3D encoder-decoder network with a convolutional capsule encoder (called 3DConvCaps) that learns lower-level features (short-range attention) with convolutional layers while modeling higher-level features (long-range dependence) with capsule layers. Our experiments on multiple datasets, including iSeg-2017, Hippocampus, and Cardiac, demonstrate that our 3DConvCaps network considerably outperforms previous capsule networks and 3D-UNet. We further conduct ablation studies of network efficiency and segmentation performance under various configurations of convolutional layers and capsule layers along both the contracting and expanding paths.
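A minimal sketch of the hybrid design, assuming illustrative dimensions: plain Conv3d blocks capture the low-level (short-range) features, and a convolutional capsule layer with the standard squash nonlinearity (dynamic routing omitted for brevity) produces the higher-level capsule vectors. This is not the paper's exact configuration.

```python
# Hybrid conv + capsule 3D encoder sketch with illustrative dimensions.
import torch
import torch.nn as nn

def squash(s, dim=-1, eps=1e-8):
    # Capsule nonlinearity: shrink short vectors, preserve direction.
    n2 = (s ** 2).sum(dim=dim, keepdim=True)
    return (n2 / (1.0 + n2)) * s / torch.sqrt(n2 + eps)

class ConvCapsEncoder3D(nn.Module):
    def __init__(self, in_ch=1, caps=8, caps_dim=16):
        super().__init__()
        self.conv = nn.Sequential(                 # short-range features
            nn.Conv3d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.caps = nn.Conv3d(64, caps * caps_dim, 3, stride=2, padding=1)
        self.caps_n, self.caps_dim = caps, caps_dim

    def forward(self, x):
        h = self.caps(self.conv(x))                # (B, caps*dim, D, H, W)
        b, _, d, hh, w = h.shape
        h = h.view(b, self.caps_n, self.caps_dim, d, hh, w)
        return squash(h, dim=2)                    # higher-level capsules

x = torch.randn(1, 1, 32, 32, 32)                  # toy 3D volume
print(ConvCapsEncoder3D()(x).shape)                # (1, 8, 16, 4, 4, 4)
```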
In this paper, we present BARTpho in two versions, BARTpho-syllable and BARTpho-word, the first public large-scale monolingual sequence-to-sequence models pre-trained for Vietnamese. BARTpho uses the "large" architecture and the pre-training scheme of the sequence-to-sequence denoising model BART, and is therefore especially suited to generative NLP tasks. We conduct experiments comparing BARTpho with its competitor mBART on the downstream task of Vietnamese text summarization, showing that BARTpho outperforms the strong baseline mBART and improves the state of the art in both automatic and human evaluations. We release BARTpho to facilitate future research and applications of generative Vietnamese NLP. Our BARTpho models are publicly available at: https://github.com/vinairesearch/bartpho
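Following the repository's documented usage, the released checkpoints can be loaded via the Hugging Face transformers API (note that BARTpho-word expects word-segmented Vietnamese input):

```python
# Extracting features from Vietnamese text with the released checkpoints.
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("vinai/bartpho-syllable")
model = AutoModel.from_pretrained("vinai/bartpho-syllable")

line = "Chúng tôi là những nghiên cứu viên."
ids = tokenizer(line, return_tensors="pt")
features = model(**ids)                      # encoder-decoder features
```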
State space models (SSMs) have demonstrated state-of-the-art sequence modeling performance in some modalities, but underperform attention in language modeling. Moreover, despite scaling nearly linearly in sequence length instead of quadratically, SSMs are still slower than Transformers due to poor hardware utilization. In this paper, we make progress on understanding the expressivity gap between SSMs and attention in language modeling, and on reducing the hardware barrier between SSMs and attention. First, we use synthetic language modeling tasks to understand the gap between SSMs and attention. We find that existing SSMs struggle with two capabilities: recalling earlier tokens in the sequence and comparing tokens across the sequence. To understand the impact on language modeling, we propose a new SSM layer, H3, that is explicitly designed for these abilities. H3 matches attention on the synthetic languages and comes within 0.4 PPL of Transformers on OpenWebText. Furthermore, a hybrid 125M-parameter H3-attention model that retains two attention layers surprisingly outperforms Transformers on OpenWebText by 1.0 PPL. Next, to improve the efficiency of training SSMs on modern hardware, we propose FlashConv. FlashConv uses a fused block FFT algorithm to improve efficiency on sequences up to 8K, and introduces a novel state passing algorithm that exploits the recurrent properties of SSMs to scale to longer sequences. FlashConv yields 2$\times$ speedup on the long-range arena benchmark and allows hybrid language models to generate text 1.6$\times$ faster than Transformers. Using FlashConv, we scale hybrid H3-attention language models up to 1.3B parameters on the Pile and find promising initial results, achieving lower perplexity than Transformers and outperforming Transformers in zero- and few-shot learning on a majority of tasks in the SuperGLUE benchmark.
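For reference, the core operation that FlashConv accelerates is the FFT-based long convolution of an SSM layer; the minimal sketch below shows that O(N log N) computation in plain PyTorch, not the fused block-FFT kernel or the state-passing algorithm themselves.

```python
# Reference FFT-based long convolution: apply a length-N SSM kernel k
# to input u in O(N log N) by multiplying in the frequency domain.
import torch

def fft_conv(u: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    # u: (batch, N) input sequences; k: (N,) SSM convolution kernel.
    n = u.shape[-1]
    u_f = torch.fft.rfft(u, n=2 * n)        # zero-pad to avoid wrap-around
    k_f = torch.fft.rfft(k, n=2 * n)
    return torch.fft.irfft(u_f * k_f, n=2 * n)[..., :n]  # causal output

u = torch.randn(4, 8192)                    # sequences up to 8K, as in paper
k = torch.randn(8192)
print(fft_conv(u, k).shape)                 # torch.Size([4, 8192])
```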
Early detection of relevant locations in a piece of news is especially important in extreme events such as environmental disasters, war conflicts, disease outbreaks, or political turmoils. Additionally, this detection also helps recommender systems to promote relevant news based on user locations. Note that, when the relevant locations are not mentioned explicitly in the text, state-of-the-art methods typically fail to recognize them because these methods rely on syntactic recognition. In contrast, by incorporating a knowledge base and connecting entities with their locations, our system successfully infers the relevant locations even when they are not mentioned explicitly in the text. To evaluate the effectiveness of our approach, and due to the lack of datasets in this area, we also contribute to the research community with a gold-standard multilingual news-location dataset, NewsLOC. It contains the annotation of the relevant locations (and their WikiData IDs) of 600+ Wikinews articles in five different languages: English, French, German, Italian, and Spanish. Through experimental evaluations, we show that our proposed system outperforms the baselines and the fine-tuned version of the model using semi-supervised data that increases the classification rate. The source code and the NewsLOC dataset are publicly available for being used by the research community at https://github.com/vsuarezpaniagua/NewsLocation.
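The following sketch illustrates the idea of KB-backed location inference; the spaCy NER pipeline is real, while KB_LOCATIONS is a hypothetical stand-in for the WikiData lookups such a system would perform.

```python
# Schematic KB-backed location inference: recognize entities, then map
# non-location entities to their locations via a knowledge base.
import spacy

KB_LOCATIONS = {                 # toy entity -> (location, WikiData ID)
    "Eiffel Tower": ("Paris", "Q90"),
    "Bundestag": ("Berlin", "Q64"),
}

nlp = spacy.load("en_core_web_sm")   # requires the model to be downloaded

def infer_locations(text):
    found = set()
    for ent in nlp(text).ents:
        if ent.label_ in ("GPE", "LOC"):       # explicit location mentions
            found.add((ent.text, None))
        elif ent.text in KB_LOCATIONS:         # implicit, inferred via KB
            found.add(KB_LOCATIONS[ent.text])
    return found

print(infer_locations("Crowds gathered near the Eiffel Tower today."))
```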
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
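For readers unfamiliar with two of the practices cited above, the sketch below shows generic K-fold cross-validation with an ensemble of identical models averaged across folds; it is illustrative and not tied to any particular challenge entry.

```python
# Generic K-fold training with an ensemble of identical models:
# average the per-fold predicted probabilities on held-out test data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold

X, y = np.random.rand(200, 16), np.random.randint(0, 2, 200)  # toy data
X_test = np.random.rand(10, 16)

fold_preds = []
for train_idx, _ in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = RandomForestClassifier().fit(X[train_idx], y[train_idx])
    fold_preds.append(model.predict_proba(X_test)[:, 1])

print(np.mean(fold_preds, axis=0))   # ensembled test predictions
```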